Abstract. In this article, we review quantile models with endogeneity. We focus on
models that achieve identification through the use of instrumental variables and discuss
conditions under which partial and point identification are obtained. We discuss key
conditions, which include monotonicity and full-rank-type conditions, in detail. In providing
this review, we update the identification results of Chernozhukov and Hansen (2005). We
illustrate the modeling assumptions through economically motivated examples. We also
briefly review the literature on estimation and inference.

1. Introduction

Quantile regression is a tool for estimating conditional quantile models that has been
used in many empirical studies and has been studied extensively in theoretical econometrics;
see Koenker and Bassett (1978) and Koenker (2005). One of quantile regression's
most appealing features is its ability to estimate quantile-specific effects that describe
the impact of covariates not only on the center but also on the tails of the conditional
outcome distribution. While the central effects, such as the mean effect obtained through
conditional mean regression, provide interesting summary statistics of the impact of a
covariate, they fail to describe the full distributional impact unless the conditioning variables
affect the central and the tail quantiles in the same way. In addition, researchers are
often interested in the impact of covariates on points other than the center of the conditional
distribution. For example, in a study of the effectiveness of a job training
program, the effect of training on the lower tail of the earnings distribution conditional
on worker characteristics may be of more interest than the effect of training on the mean
of the distribution.

In observational studies, the variables of interest (e.g. education or prices) are often
endogenous. Just as with the conventional linear model, endogeneity of covariates renders
the conventional quantile regression inconsistent for estimating the causal (structural)
effects of covariates on the quantiles of economic outcomes. One approach to addressing
this problem is to generalize the instrumental variables framework to allow for estimation
of quantile models. In this paper, we review developments in instrumental variables
approaches to modeling and estimating quantile treatment (structural) effects (QTE) in
the presence of endogeneity.

We focus our review on the modeling framework of Chernozhukov and Hansen (2005),
which provides conditions for identification of the QTE without functional form assumptions.
The principal identifying assumption of the model is the imposition of conditions
which restrict how rank variables (structural errors) may vary across treatment states.
These conditions allow the use of instrumental variables to overcome the endogeneity
problem and recover the true QTE. This framework also ties naturally to simultaneous
equations models, corresponding to a structural simultaneous equation model with non-additive
errors. Within this framework, estimation and inference procedures for linear
quantile models have been developed by Chernozhukov and Hansen (2006), Chernozhukov
and Hansen (2008), Chernozhukov, Hansen, and Jansson (2009), and Jun (2008);
nonparametric estimation has been considered by Chernozhukov, Imbens, and Newey (2007),
Horowitz and Lee (2007), and Gagliardini and Scaillet (2012); and inference with discrete
outcomes has been explored by Chesher (2005). Moreover, the modeling framework provides
a foundation for other estimation methods based on IV median-independence and
more general quantile-independence conditions as in Abadie (1997), Chernozhukov and
Hong (2003), Chen, Linton, and Van Keilegom (2003), Hong and Tamer (2003), Honore and
Hu (2004), and Sakata (2007). It is also important to note that the modeling framework
we review can be used to study nonparametric identification of structural economic models
in cases where quantile effects are not necessarily the chief objects of interest. Berry
and Haile (2010) provide an excellent example of this in the context of discrete choice
models with endogeneity.

We also briefly review other modeling approaches for quantile effects with endogenous
covariates. Abadie, Angrist, and Imbens (2002) consider a QTE model for the
sub-population of "compliers" which applies to binary endogenous variables with binary
instruments. Imbens and Newey (2009), Chesher (2003), Lee (2007), and Koenker and
Ma (2006) use models with triangular structures and show how control functions can be
constructed and used to estimate structural objects of interest. While these models share
some features with the model of Chernozhukov and Hansen (2005), the three approaches
are non-nested in general.

Quantile models with endogeneity have been used in many empirical studies in economics.
See Abadie, Angrist, and Imbens (2002); Chernozhukov and Hansen (2004); Hausman
and Sidak (2004); Forbes (2008); Eren (2009); Kostov (2009); Maynard and Qiu (2009);
Wehby, Murray, Castilla, Lopez-Camelo, and Ohsfeldt (2009); Lamarche (2011); Autor,
Houseman, and Kerr (2012); and Somaini (2012) among others. We do not provide a
review of empirical applications but note that these papers provide further discussion of how
the instrumental variables quantile model relates to their specific framework and illustrate
some of the rich effects that one can estimate using quantile methods.

2. An IV Quantile Model 

In this section, we present an instrumental variable model for quantile treatment effects 
(QTE), its main econometric implication, and the principal identification result. 

2.1. Framework. Our model is developed within the conventional potential (latent) outcome
framework, e.g. Heckman and Robb (1986). Potential real-valued outcomes, which
vary among individuals or observational units, are indexed against potential treatment
states $d \in \mathcal{D}$ and denoted $Y_d$. The potential outcomes $\{Y_d\}$ are latent because, given the
selected treatment $D$, the observed outcome for each individual or observational unit is
only one component

\[ Y := Y_D \]

of the potential outcomes vector $\{Y_d\}$. Throughout the paper, capital letters denote
random variables, and lower case letters denote the potential values they may take. We
do not explicitly state various technical measurability assumptions as these can be deduced
from the context.[1]

The objective of causal or structural analysis is to learn about features of the distributions
of potential outcomes $Y_d$. Of primary interest to us are the $r$-th quantiles of
potential outcomes under various treatments $d$, conditional on observed characteristics
$X = x$, denoted as

\[ q(d, x, r). \]

We will refer to the function $q(d, x, r)$ as the quantile treatment response (QTR) function.
We are also interested in the quantile treatment effects (QTE), defined as

\[ q(d_1, x, r) - q(d_0, x, r), \]

that summarize the differences in the impact of treatments on the quantiles of potential
outcomes (Lehmann (1974), Doksum (1974)).

1 For simplicity, we could assume that $d$ takes on a countable set of values $\mathcal{D}$ or make separability
assumptions which imply that the stochastic process $\{Y_d, d \in \mathcal{D}\}$ is determined by its definition over a
countable subset $\mathcal{D}_0 \subset \mathcal{D}$. See van der Vaart and Wellner (1996).

Typically, the realized treatment $D$ is selected in relation to potential outcomes, inducing
endogeneity. This endogeneity makes the conventional quantile regression of observed
$Y$ on observed $D$, which relies upon the restriction

\[ P[Y \le \theta(D, X, r) \mid X, D] = r \quad \text{a.s.}, \]

inappropriate for measuring $q(d, x, r)$ and the QTE. Indeed, the function $\theta(d, x, r)$ solving
these equations will not be equal to $q(d, x, r)$ under endogeneity. The model presented
next states conditions under which we can identify and estimate the quantiles of latent
outcomes through the use of instruments $Z$ that affect $D$ but are independent of potential
outcomes and the nonlinear quantile-type conditional moment restrictions

\[ P[Y \le q(D, X, r) \mid X, Z] = r \quad \text{a.s.} \]
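The contrast between the two restrictions can be checked numerically. The sketch below is purely illustrative and is not part of the original analysis: the data-generating process, the function names `q` and `simulate`, and every numerical constant are assumptions made for the example. It uses a binary treatment, a binary instrument, a single rank variable shared by both treatment states, and no covariates.

```python
import numpy as np

def q(d, u):
    """Hypothetical structural quantile function, strictly increasing in u."""
    return u + d * (1.0 + u)

def simulate(n, seed=0):
    """Draw (Y, D, Z) from an assumed model with an endogenous treatment."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)             # rank variable, U ~ U(0,1)
    z = rng.integers(0, 2, size=n)      # binary instrument, independent of U
    e = rng.uniform(size=n)             # additional selection noise
    # Selection depends on U, which is what makes D endogenous.
    d = (0.5 * u + 0.5 * e > 0.6 - 0.4 * z).astype(int)
    return q(d, u), d, z

r = 0.5
y, d, z = simulate(200_000)
below = y <= q(d, r)                    # the event {Y <= q(D, r)}

# Conditioning on the instrument recovers the quantile index r.
p_given_z = [below[z == k].mean() for k in (0, 1)]
# Conditioning on the endogenous treatment does not.
p_given_d = [below[d == k].mean() for k in (0, 1)]
print(p_given_z, p_given_d)
```

In this design the frequencies conditional on Z should both be close to r, while those conditional on D should not, because selection into treatment depends on the rank variable.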

2.2. The Instrumental Quantile Treatment Effects (IVQT) Model. Having conditioned
on the observed characteristics $X = x$, each latent outcome $Y_d$ can be related to
its quantile function $q(d, x, r)$ as:[2]

\[ Y_d = q(d, x, U_d), \quad \text{where } U_d \sim U(0, 1) \tag{2.1} \]

is the structural error term. We note that representation (2.1) is essential to what follows.

The structural error $U_d$ is responsible for heterogeneity of potential outcomes among
individuals with the same observed characteristics $x$. This error term determines the
relative ranking of observationally equivalent individuals in the distribution of potential
outcomes given the individuals' observed characteristics, and thus we refer to $U_d$ as the
rank variable. Since $U_d$ drives differences in observationally equivalent individuals, one
may think of $U_d$ as representing some unobserved characteristic, e.g. ability or
proneness.[3] This interpretation makes quantile analysis an interesting tool for describing and
learning the structure of heterogeneous treatment effects and accounting for unobserved
heterogeneity; see Doksum (1974), Heckman and Smith (1997), and Koenker (2005).

2 This follows by the Fisher-Skorohod representation of random variables, which states that given a collection
of variables $\{\zeta_d\}$, each variable $\zeta_d$ can be represented as $\zeta_d = q(d, U_d)$, for some $U_d \sim U(0, 1)$, cf. Durrett
(1996), where $q(d, r)$ denotes the $r$-quantile of variable $\zeta_d$.

3 Doksum (1974) uses the term proneness as in "prone to learn fast" or "prone to grow taller".

For example, consider a returns-to-training model, where the $Y_d$'s are potential earnings
under different training levels $d$, and $q(d, x, r)$ is the conditional earnings function which
describes how an individual having training $d$, characteristics $x$, and the latent "ability" $r$
is rewarded by the labor market. The earnings function may be different for different levels
of $r$, implying heterogeneous effects of training on earnings of people that have different
levels of "ability". For example, it may be that the largest returns to training accrue
to those in the upper tail of the conditional distribution, that is, to the "high-ability"
workers.[4]

Formally, the IVQT model consists of five conditions (some are representations) that 
hold jointly. 

Main Conditions of the Model: Consider a common probability space $(\Omega, F, P)$ and
the set of potential outcome variables $(Y_d, d \in \mathcal{D})$, the covariate variables $X$, and the
instrumental variables $Z$. The following conditions hold jointly with probability one:

A1 Potential Outcomes. Conditional on $X$ and for each $d$, $Y_d = q(d, X, U_d)$,
where $r \mapsto q(d, X, r)$ is non-decreasing on $[0, 1]$ and left-continuous and $U_d \sim U(0, 1)$.

A2 Independence. Conditional on $X$ and for each $d$, $U_d$ is independent of the
instrumental variables $Z$.

A3 Selection. $D := \delta(Z, X, V)$ for some unknown function $\delta$ and random vector $V$.

A4 Rank Similarity. Conditional on $(X, Z, V)$, $\{U_d\}$ are identically distributed.

A5 The observed random vector consists of $Y := Y_D$, $D$, $X$ and $Z$.

The following is the main econometric implication of the model. 



Theorem 1 (Main Statistical Implication). Suppose conditions A1-A5 hold. (i) Then we
have for $U := U_D$, with probability one,

\[ Y = q(D, X, U), \quad U \sim U(0, 1) \mid X, Z. \tag{2.2} \]

(ii) If (2.2) holds and $r \mapsto q(d, X, r)$ is strictly increasing for each $d$, then for each
$r \in (0, 1)$, a.s.

\[ P[Y \le q(D, X, r) \mid X, Z] = r. \tag{2.3} \]

(iii) If (2.2) holds, then for any closed subset $I$ of $[0, 1]$, a.s.

\[ P(U \in I) \le P[Y \in q(D, X, I) \mid X, Z], \tag{2.4} \]

where $q(d, x, I)$ is the image of $I$ under the mapping $r \mapsto q(d, x, r)$.

4 It is important to note that the quantile index, $r$, in $q(d, x, r)$ refers to the quantile of potential
outcome $Y_d$ given that exogenous variables are set at $X = x$ and not to the unconditional quantile of $Y_d$.
For example, suppose that one of the control variables in the earnings example is years of schooling. An
individual at the 30th percentile of the distribution of $Y_d$ given, say, 20 years of schooling is not necessarily
low income, as even a relatively low earner with that level of education may still earn above the median
earnings in the overall population.

The first result states that the main consequence of A1-A5 is a simultaneous equation
model (2.2) with non-separable error $U$ that is independent of $Z$, $X$, and normalized
so that $U \sim U(0, 1)$. The second result considers econometric implications when
$r \mapsto q(D, X, r)$ is strictly increasing, which requires that $Y$ is non-atomic conditional on $X$
and $Z$. In this case, we obtain the conditional moment restriction (2.3). This implication
follows from the first result and the fact that

\[ \{Y \le q(D, X, r)\} \text{ is equivalent to } \{U \le r\}, \]

when $q(D, X, r)$ is strictly increasing in $r$. The final result deals with the case where $Y$ may
have atoms conditional on $X$ and $Z$, e.g. when $Y$ is a count or discrete response variable.
The first two results were obtained in Chernozhukov and Hansen (2005), and the third
result is in the spirit of results given in Chesher, Rosen, and Smolinski (2011); Chesher
(2005); and Chesher and Smolinski (2010). The latter results are related to random
set/optimal transport methods for identification analysis; see Beresteanu, Molchanov,
and Molinari (2011); Ekeland, Galichon, and Henry (2010); Galichon and Henry (2009);
and Galichon and Henry (2011).
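The reasoning behind part (i) of Theorem 1 can be sketched in a few lines. The display below is a paraphrase of how conditions A1-A5 combine, not a reproduction of the proof in Chernozhukov and Hansen (2005). Here $\delta$ denotes the selection function in A3, $d_0$ is an arbitrary fixed reference treatment state introduced only for this sketch, and $r \in [0, 1]$ is fixed.

```latex
\begin{align*}
P[U_D \le r \mid X, Z]
  &= E\big[\, P[U_{\delta(Z,X,V)} \le r \mid X, Z, V] \,\big|\, X, Z \big]
     && \text{(A3 and iterated expectations)} \\
  &= E\big[\, P[U_{d_0} \le r \mid X, Z, V] \,\big|\, X, Z \big]
     && \text{(A4: given $(X, Z, V)$, $D$ is fixed and $U_d \overset{d}{=} U_{d_0}$)} \\
  &= P[U_{d_0} \le r \mid X, Z]
     && \text{(iterated expectations)} \\
  &= P[U_{d_0} \le r \mid X] = r
     && \text{(A2, then A1).}
\end{align*}
```

Together with $Y = Y_D = q(D, X, U_D)$, which follows from A1 and A5, this gives representation (2.2) with $U := U_D$.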

The model and the results of Theorem 1 are useful for two reasons. First, Theorem 1
serves as a means of identifying the QTE in a reasonably general heterogeneous effects
model. Second, by demonstrating that the IVQT model leads to the conditional moment
restrictions (2.3) and (2.4), Theorem 1 provides an economic and causal foundation for
estimation based on these restrictions.

2.3. The Identification Regions. The conditions presented above yield the following
identification region for the structural quantile function $(d, x, r) \mapsto q(d, x, r)$. The
identification region for the case of strictly increasing $r \mapsto q(d, x, r)$ can be stated as the set $\mathcal{Q}$
of functions $(d, x, r) \mapsto m(d, x, r)$ that satisfy the following relations, for all $r \in (0, 1]$:

\[ P[Y \le m(D, X, r) \mid X, Z] = r \quad \text{a.s.} \tag{2.5} \]

This representation of the identification region $\mathcal{Q}$ is implicit. Nevertheless, statistical
inference about $q \in \mathcal{Q}$ can be based on (2.5) and can be carried out in practice using
weak-identification-robust inference as described in Chernozhukov and Hansen (2008),
Marmer and Sakata (2012), Jun (2008), Santos (2012), or Chernozhukov, Hansen, and
Jansson (2009). Under conditions that yield point identification, these regions collapse
to a singleton, and the aforementioned weak-identification-robust inference procedures
retain their validity.
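For intuition about how a restriction of the form (2.5) pins down the structural function, the following sketch runs a simple grid search on simulated data. It is not one of the inference procedures cited above. It only borrows the idea, associated with the inverse quantile regression of Chernozhukov and Hansen (2006), of looking for the treatment effect at which the instrument has no remaining explanatory power for the quantile of the adjusted outcome. The setting is deliberately minimal and entirely assumed: a binary treatment, a binary instrument, no covariates, and a hypothetical structural function $q(d, u) = u + d(1 + u)$, so that the effect at quantile $r$ is $1 + r$. The function name `iv_quantile_effect` is illustrative.

```python
import numpy as np

def iv_quantile_effect(y, d, z, r, grid):
    """Grid search for the effect of a binary treatment at quantile r.

    For each candidate a, compare the r-quantile of Y - a*D across the two
    instrument groups; the moment restriction implies a zero gap at the truth.
    """
    z1, z0 = (z == 1), (z == 0)
    gaps = np.empty(grid.size)
    for i, a in enumerate(grid):
        w = y - a * d
        gaps[i] = abs(np.quantile(w[z1], r) - np.quantile(w[z0], r))
    return grid[np.argmin(gaps)], gaps

# Assumed data-generating process (illustrative constants only).
rng = np.random.default_rng(1)
n = 100_000
u = rng.uniform(size=n)                         # rank variable
z = rng.integers(0, 2, size=n)                  # instrument, independent of U
e = rng.uniform(size=n)
d = (0.5 * u + 0.5 * e > 0.6 - 0.4 * z).astype(int)   # selection depends on U
y = u + d * (1.0 + u)                           # true effect at r is 1 + r

r = 0.5
grid = np.linspace(0.0, 3.0, 301)
alpha_iv, gaps = iv_quantile_effect(y, d, z, r, grid)
# Naive comparison of treated and untreated r-quantiles, ignoring endogeneity.
alpha_naive = np.quantile(y[d == 1], r) - np.quantile(y[d == 0], r)
print(alpha_iv, alpha_naive)
```

With a binary instrument and no covariates, a quantile regression of the adjusted outcome on a constant and Z reduces to a comparison of group quantiles, which is why no regression routine is needed here. The weak-identification-robust procedures cited in the text address settings where such a minimum is not well separated.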

The identification region for the case of weakly increasing $r \mapsto q(d, x, r)$ can be stated
as the set $\mathcal{Q}$ of functions $(d, x, r) \mapsto m(d, x, r)$ that satisfy the following relations: For
any closed subset $I$ of $(0, 1]$,

\[ P(U \in I) \le P[Y \in m(D, X, I) \mid X, Z] \quad \text{a.s.}, \tag{2.6} \]

where $m(D, X, I)$ is the image of $I$ under the mapping $r \mapsto m(D, X, r)$. The inference
problem here falls in the class of conditional moment inequalities, and approaches such as
those described in Andrews and Shi (2013) or Chernozhukov, Lee, and Rosen (2013), for
example, can be used. The sets $I$ to be checked could be reduced by determining approximate
core-determining subsets; see Chesher, Rosen, and Smolinski (2011), Galichon and
Henry (2009), Galichon and Henry (2011) for further discussion.
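A small simulation can make the inequality in (2.6) concrete for a discrete outcome. Everything in the sketch is an assumption chosen for illustration: the step function `q_discrete`, the selection rule, and the interval $I = [0.3, 0.6]$. The structural function is left-continuous and weakly increasing in the rank, so the outcome has atoms.

```python
import numpy as np

def q_discrete(d, u):
    """Hypothetical weakly increasing, left-continuous structural function."""
    return np.ceil(4 * u) + d

rng = np.random.default_rng(2)
n = 200_000
u = rng.uniform(size=n)                         # rank variable
z = rng.integers(0, 2, size=n)                  # instrument, independent of U
e = rng.uniform(size=n)
d = (0.5 * u + 0.5 * e > 0.6 - 0.4 * z).astype(int)
y = q_discrete(d, u)

lo, hi = 0.3, 0.6                               # closed interval I = [lo, hi]
# For this unit-step function the image q(d, I) is the set of integers
# between q(d, lo) and q(d, hi).
in_image = (y >= q_discrete(d, lo)) & (y <= q_discrete(d, hi))

lhs = hi - lo                                   # P(U in I) for U ~ U(0,1)
rhs = [in_image[z == k].mean() for k in (0, 1)]
print(lhs, rhs)
```

The right-hand side exceeds the left because every rank in the two steps touched by $I$ maps into the image of $I$, so the restriction holds as an inequality rather than an equality.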

2.4. Discussion of the Model. Condition A1 imposes monotonicity on the structural
function of interest which makes its relation to the QTR apparent. Condition A2 states
that potential outcomes are independent of $Z$, given $X$, which is a conventional independence
restriction. Condition A3 is a convenient representation of a treatment selection
mechanism, stated for the purposes of discussion. In A3, the unobserved random vector
$V$ is responsible for the difference in treatment choices $D$ across observationally identical
individuals. Dependence between $V$ and $\{U_d\}$ is the source of endogeneity that makes
the conventional exogeneity assumption $U \sim U(0, 1) \mid X, D$ break down. This failure leads
to inconsistency of exogenous quantile methods for estimating the structural quantile
function. Within the model outlined above, this breakdown is resolved through the use
of instrumental variables.

The independence imposed in A2 and A3 is weaker than the commonly made assumption
that both the disturbances $\{U_d\}$ in the outcome equation and the disturbances $V$
in the selection equation are jointly independent of the instrument $Z$; e.g. Heckman and
Robb (1986) and Imbens and Angrist (1994). The latter assumption may be violated
when the instrument is measured with error as discussed in Hausman (1977) or the
instrument is not assigned exogenously relative to the selection equation as in Example 2
in Imbens and Angrist (1994).

Condition A4 restricts the variation in ranks across potential outcomes and is key for
identifying the QTR and associated QTE. Its simplest, though strongest, form is rank
invariance, when ranks $U_d$ do not vary with potential treatment states $d$:[5]

\[ U_d = U \text{ for each } d \in \mathcal{D}. \tag{2.7} \]

For example, under rank invariance, people who are strong (highly ranked) earners without
a training program ($d = 0$) remain strong earners having done the training ($d = 1$).
Indeed, the earnings of a person with characteristics $x$ and rank $U = r$ in the training
state "0" is $Y_0 = q(0, x, r)$ and in the state "1" is $Y_1 = q(1, x, r)$.[6] Thus, rank invariance
implies that a common unobserved factor $U$, such as innate ability, determines the ranking
of a given person across treatment states.

Rank invariance implies that the potential outcomes $\{Y_d\}$ are jointly degenerate, which
may be implausible on logical grounds, as pointed out by Heckman and Smith (1997).
Also, the rank variables $U_d$ may be determined by many unobserved factors. Thus, it is
desirable to allow the rank $U_d$ to change across $d$, reflecting some unobserved, asystematic
variation. Rank similarity A4 achieves this property while managing to preserve the useful
moment restriction (2.3).

Rank similarity A4 relaxes exact rank invariance by allowing asystematic deviations,
"slippages" in the terminology of Heckman and Smith (1997), in one's rank away from
some common level $U$. Conditional on $U$, which may enter disturbance $V$ in the selection
equation, we have the following condition on the slippage:[7]

\[ U_d - U \text{ are identically distributed across } d \in \mathcal{D}. \tag{2.8} \]

In this formulation, we implicitly assume that one selects the treatment without knowing
the exact potential outcomes; i.e. one may know $U$ and even the distribution of slippages,
but does not know the exact slippages $U_d - U$. This assumption is consistent with many
empirical situations where the exact latent outcomes are not known before receipt of
treatment. We also note that conditioning on appropriate covariates $X$ may be important
to achieve rank similarity.

5 Notice that under rank invariance, condition A3 is a pure representation, not a restriction, since
nothing restricts the unobserved information component $V$.

6 Rank invariance is used in many interesting models without endogeneity. See e.g. Doksum (1974),
Heckman and Smith (1997), and Koenker and Geling (2001).

In summary, rank similarity is an important restriction of the IVQT model that allows
us to address endogeneity. This restriction is absent in conventional endogenous heterogeneous
treatment effect models. However, similarity enables a more general selection
mechanism, A3, and weaker independence conditions on instruments than often are assumed
in nonseparable IV models. The main force of rank similarity and the other stated
assumptions is the implied moment restriction (2.3) of Theorem 1, which is useful for
identification and estimation of the quantile treatment effects.

2.5. Examples. We present some examples that highlight the nature of the model, its 
strengths, and its limitations. 

Example 1 (Demand with Non-Separable Error). The following is a generalization of
the classic supply-demand example. Consider the model

\[
\begin{aligned}
Y_p &= q(p, U), \\
\tilde{Y}_p &= \rho(p, z, \mathcal{U}), \\
P &\in \{p : \rho(p, Z, \mathcal{U}) = q(p, U)\},
\end{aligned}
\tag{2.9}
\]

where functions $q$ and $\rho$ are increasing in the last argument. The function $p \mapsto Y_p$ is
the random demand function, and $p \mapsto \tilde{Y}_p$ is the random supply function. Additionally,
functions $q$ and $\rho$ may depend on covariates $X$, but this dependence is suppressed.

Random variable $U$ is the level of demand and describes the demand curve at different
states of the world. Demand is maximal when $U = 1$ and minimal when $U = 0$, holding $p$
fixed. Note that we imposed rank invariance (2.7), as is typical in classic supply-demand
models, by making $U$ invariant to $p$.

7 Conditioning is required to be on all components of $V$ in the selection equation A3.

Model (2.9) incorporates traditional additive error models for demand which have
$Y_p = q(p) + \epsilon$ where $\epsilon = Q_\epsilon(U)$. The model is much more general in that the price can affect
the entire distribution of the demand curve, while in traditional models it only affects the
location of the distribution of the demand curve.

The $r$-quantile of the demand curve $p \mapsto Y_p$ is given by $p \mapsto q(p, r)$. Thus, the curve
$p \mapsto Y_p$ lies below the curve $p \mapsto q(p, r)$ with probability $r$. Therefore, the various
quantiles of the potential outcomes play an important role in describing the distribution
and heterogeneity of the stochastic demand curve. The quantile treatment effect may be
characterized by $\partial q(p, r)/\partial p$ or by an elasticity $\partial \ln q(p, r)/\partial \ln p$. For example, consider
the Cobb-Douglas model $q(p, r) = \exp(\beta(r) + \alpha(r) \ln p)$, which corresponds to a
Cobb-Douglas model for demand with non-separable error $Y_p = \exp(\beta(U) + \alpha(U) \ln p)$. The
log transformation gives $\ln Y_p = \beta(U) + \alpha(U) \ln p$, and the quantile treatment effect for
the log-demand equation is given by the elasticity of the original $r$-demand curve

\[ \alpha(r) = \frac{\partial Q_{\ln Y_p}(r)}{\partial \ln p} = \frac{\partial \ln q(p, r)}{\partial \ln p}. \]

The elasticity $\alpha(U)$ is random and depends on the state of the demand $U$ and may
vary considerably with $U$. For example, this variation could arise when the number of
buyers varies and aggregation induces a non-constant elasticity across the demand levels.
Chernozhukov and Hansen (2008) estimate a simple demand model based on data from
a New York fish market that was first collected and used by Graddy (1995). They find
point estimates of the demand elasticity, $\alpha(r)$, that vary quite substantially from $-2$ for
low quantiles to $-0.5$ for high quantiles of the demand curve.
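The link between the random-coefficient form and the quantile form of the Cobb-Douglas model can be illustrated by simulation. The coefficient functions `alpha` and `beta` below are hypothetical and chosen only so that log demand is increasing in $U$ at the prices considered. The endpoints of `alpha` echo the range of elasticities quoted above but are not estimates from any data set. Prices are set exogenously here, so there is no endogeneity in this sketch.

```python
import numpy as np

def alpha(u):
    """Hypothetical elasticity, rising from -2 to -0.5 with the demand level."""
    return -2.0 + 1.5 * u

def beta(u):
    """Hypothetical intercept function."""
    return 2.0 * u

def log_q(p, r):
    """r-quantile of log demand at price p: beta(r) + alpha(r) * ln p."""
    return beta(r) + alpha(r) * np.log(p)

rng = np.random.default_rng(3)
u = rng.uniform(size=200_000)                   # demand level U ~ U(0,1)

# Potential log demand at two exogenously fixed prices.
p0, p1 = 1.5, 2.0
ln_y0 = beta(u) + alpha(u) * np.log(p0)
ln_y1 = beta(u) + alpha(u) * np.log(p1)

results = {}
for r in (0.1, 0.5, 0.9):
    emp0, emp1 = np.quantile(ln_y0, r), np.quantile(ln_y1, r)
    # Log-price slope of the r-quantile of potential log demand.
    slope = (emp1 - emp0) / (np.log(p1) - np.log(p0))
    results[r] = (emp0, log_q(p0, r), slope, alpha(r))
    print(r, results[r])
```

Because log demand is increasing in $U$ at both prices, the sample quantiles of potential demand track $\ln q(p, r)$, and their slope in $\ln p$ tracks $\alpha(r)$.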

The third condition in (2.9), $P \in \{p : \rho(p, Z, \mathcal{U}) = q(p, U)\}$, is the equilibrium condition
that generates endogeneity; the selection of the clearing price $P$ by the market
depends on the potential demand and supply outcomes. As a result we have a representation
that is consistent with A3, $P = \delta(Z, V)$, where $V$ consists of $U$ and $\mathcal{U}$ and may
include "sunspot" variables if the equilibrium price is not unique. Thus what we observe
can be written as

\[ Y := q(P, U), \quad P := \delta(Z, V), \quad U \text{ is independent of } Z. \tag{2.10} \]

Identification of the $r$-quantile of the demand function, $p \mapsto q(p, r)$, is obtained through
the use of instrumental variables $Z$, like weather conditions or factor prices, that shift the
supply curve and do not affect the level of the demand curve, $U$, so that independence
assumption A2 is met. Furthermore, the IVQT model allows arbitrary correlation between
$Z$ and $V$. This property is important as it allows, for example, $Z$ to be measured with
error or to be exogenous relative to the demand equation but endogenous relative to the
supply equation.

Example 2 (Savings). Chernozhukov and Hansen (2004) use the framework of the IVQT
model to examine the effects of participating in a 401(k) plan on an individual's accumulated
wealth. Since wealth is continuous, wealth, $Y_d$, in the participation state $d \in \{0, 1\}$
can be represented as

\[ Y_d = q(d, X, U_d), \quad U_d \sim U(0, 1), \]

where $r \mapsto q(d, X, r)$ is the conditional quantile function of $Y_d$ and $U_d$ is an unobserved
random variable. $U_d$ is an unobservable that drives differences in accumulated wealth
conditional on $X$ under participation state $d$. Thus, one might think of $U_d$ as the preference
for saving and interpret the quantile index $r$ as indexing rank in the preference for saving
distribution. One could also model the individual as selecting the 401(k) participation
state to maximize expected utility:

\[
D = \arg\max_{d \in \mathcal{D}} E\big[ W\{Y_d, d\} \mid X, Z, V \big]
  = \arg\max_{d \in \mathcal{D}} E\big[ W\{q(d, X, U_d), d\} \mid X, Z, V \big],
\tag{2.11}
\]

where $W\{Y_d, d\}$ is the random indirect utility derived under participation state $d$.[8] As a
result, the participation decision is represented by

\[ D = \delta(Z, X, V), \]

where $Z$ and $X$ are observed, $V$ is an unobserved information component that may be
related to ranks $U_d$ and includes other unobserved variables that affect the participation
state, and function $\delta$ is unknown. This model fits into the IVQT model with the
independence condition A2 requiring that $U_d$ is independent of $Z$, conditional on $X$.

The simplest form of rank similarity is rank invariance (2.7), under which the preference
for saving vector $U_d$ may be collapsed to a single random variable $U = U_0 = U_1$. In
this case, a single preference for saving is responsible for an individual's ranking across all
treatment states. The rank similarity condition A4 is a more general form of rank invariance.
It relaxes the exact invariance of ranks $U_d$ across $d$ by allowing noisy, unsystematic
variations of $U_d$ across $d$, conditional on $(V, X, Z)$. This relaxation allows for variation
in rank across the treatment states, requiring only an "expectational rank invariance."
Similarity implies that given the information in $(V, X, Z)$ employed to make the selection
of treatment $D$, the expectation of any function of rank $U_d$ does not vary across the treatment
states. That is, ex-ante, conditional on $(V, X, Z)$, the ranks may be considered to
be the same across potential treatments, but the realized, ex-post, rank may be different
across treatment states.

8 It may depend on both observables in $X$ as well as realized and unrealized unobservables. Only
dependence on $Y_d$ and $d$ is highlighted.

From an econometric perspective, the similarity assumption is nothing but a restriction
on the unobserved heterogeneity component which precludes systematic variation of $U_d$
across the treatment states. To be more concrete, consider the following simple example
where

\[ U_d = F_{V+\eta_d}(V + \eta_d), \]

where $F_{V+\eta_d}(\cdot)$ is the distribution function of $V + \eta_d$ and $\{\eta_d\}$ are mutually iid conditional
on $V$, $X$, and $Z$. The variable $V$ represents an individual's "mean" saving preference,
while $\eta_d$ is a noisy adjustment.[9] This more general assumption leaves the individual
optimization problem (2.11) unaffected, while allowing variation in an individual's rank
across different potential outcomes.

While we feel that similarity may be a reasonable assumption in many contexts, imposing
similarity is not innocuous. In the context of 401(k) participation, matching practices
of employers could jeopardize the validity of the similarity assumption. To be more concrete,
let $U_d = F_{V+\eta_d}(V + \eta_d)$ as before but let $\eta_d = dM$ for random variable $M$ that
depends on the match rate and is independent of $V$, $X$, and $Z$. Then conditional on
$V = v$, $X$, and $Z$, $U_0 = F_V(v)$ is degenerate but $U_1 = F_{V+M}(v + M)$ is not. Therefore, $U_1$
is not equal to $U_0$ in distribution. Similarity may still hold in the presence of the employer
match if the rank, $U_d$, in the asset distribution is insensitive to the match rate. The rank
may be insensitive if, for example, individuals follow simple rules of thumb such as target
saving when they make their savings decisions. Also, if the variation of match rates is
small relative to the variation of individual heterogeneity or if the covariates capture most
of the variation in match rates, then similarity may be satisfied approximately.
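The difference between the two constructions of the ranks can be seen in a short simulation. The sketch rests on assumptions that are not in the text: $V$, $\eta_0$, $\eta_1$, and $M$ are drawn as independent standard normals, covariates and instruments are omitted, conditioning on $V$ is approximated by restricting to draws with $V > 1$, and the distribution functions are replaced by empirical ones through the helper `ranks`.

```python
import numpy as np

def ranks(x):
    """Empirical distribution function of x evaluated at each draw."""
    return np.searchsorted(np.sort(x), x, side="right") / x.size

rng = np.random.default_rng(4)
n = 200_000
v = rng.normal(size=n)                          # "mean" saving preference V

# Case 1: eta_0 and eta_1 are iid given V, so similarity holds.
eta0, eta1 = rng.normal(size=n), rng.normal(size=n)
u0, u1 = ranks(v + eta0), ranks(v + eta1)

# Case 2: eta_d = d * M with M independent of V, so similarity fails.
m = rng.normal(size=n)
w0, w1 = ranks(v), ranks(v + m)

high = v > 1.0                                  # condition on a range of V
rank_corr = np.corrcoef(u0, u1)[0, 1]
gap_similar = abs(u0[high].mean() - u1[high].mean())
gap_match = abs(w0[high].mean() - w1[high].mean())
print(rank_corr, gap_similar, gap_match)
```

In the first case the realized ranks differ across treatment states (their correlation is well below one), yet their conditional means agree up to sampling noise. In the second case the conditional means of the two ranks are visibly different, which is the failure of similarity described above.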

9 Clearly similarity holds in this case, $U_d \overset{d}{=} U_{d'}$ given $V$, $X$, and $Z$.

Example 3 (Discrete Choice Model with Market-Level Data). Berry and Haile (2010)
show that a general model for market-level data realized from a discrete-choice problem
can fit within the IVQT model. To keep notation and exposition simple, we consider a
much-simplified version of the model from Berry and Haile (2010) in which consumer $i$'s
indirect utility from choosing product $j$ is

\[ u_{ijt} = u(X_{jt}, P_{jt}, \xi_{jt}, \nu_{ijt}) = u(\delta_j(X_{jt}, \xi_{jt}), P_{jt}, \nu_{ijt}), \]

where $t$ indexes markets, $X_{jt}$ are observed exogenous product-market characteristics, $P_{jt}$
is the observed price of product $j$ in market $t$ which is treated as endogenous, $\xi_{jt}$ are
product-market specific unobservables, and $\nu_{ijt}$ are individual-product-market specific
unobservables that have density $f(\cdot)$. Thus, the model imposes that unobserved
product-market specific effects and observed variables $X_{jt}$ may only affect utility through the index
$\delta_{jt} = \delta_j(X_{jt}, \xi_{jt})$, where $\delta_j(\cdot, \cdot)$ may differ arbitrarily across products but is the same across
all markets. That unobserved product characteristics affect utility only through a scalar
index is a substantive restriction but is common in the literature on discrete choice models
where, for example, one can interpret the index as an aggregate representing product
quality.

An individual will then choose the product that maximizes individual utility. Letting
$Y_{it}$ denote the observed choice of individual $i$, we have that

\[ Y_{it} = \arg\max_{j \le J} u_{ijt}, \]

where we assume the same $J$ products are available in each market for simplicity.[10] The
market share of each product will then be given as

\[
\begin{aligned}
S_{jt} &= \int 1\{u(\delta_{jt}, P_{jt}, \nu) = \max_{k \le J} u(\delta_{kt}, P_{kt}, \nu)\} f(\nu) \, d\nu \\
       &:= s_j(\{\delta_{jt}, P_{jt}\}_{j=1}^{J}) = s_j(\delta_t, P_t),
\end{aligned}
\]

where $\delta_t = (\delta_{1t}, \ldots, \delta_{Jt})'$ and $P_t = (P_{1t}, \ldots, P_{Jt})'$.

To fit this model into the instrumental variables quantile regression model, Berry and 
Haile (2010) make several assumptions to produce a structural relationship which is mono- 
tonic in a scalar unobservable. First, they assume that the utility function u(5jt, Pjt, Vijt) 
is strictly increasing in 5j t . This assumption is standard in the discrete choice literature 
and coincides with the interpretation of 6j t as product quality where higher quality prod- 
ucts are associated with higher utility all else equal. Monotonicity of the utility function 
is not sufficient due to the fact that all that is observed is the market share which depends 
on the utility of each potential choice. Thus, Berry and Haile (2010) make an additional 
assumption that they term "connected substitutes." Intuitively, this condition implies 
that an increase in the quality of every good within some strict subset of the available 
choices will be associated with the total market share of all goods not in the subset de- 
creasing as long as the quality of no good outside of the subset increases. Berry and Haile 



10 Obviously, identification of the model requires normalizations. For example, the utility from one 
of the options is generally normalized to zero. As this model is not the focus of this review, we do not 
discuss these normalizations which are discussed in detail in a more general context in Berry and Haile 
(2010). 
(2010) show that the connected substitutes condition is satisfied in the usual random utility
discrete choice models and that it can hold fairly generally. Using these assumptions,
Berry and Haile (2010) use a result from Gandhi (2008) which shows that the system of
equations

$$S_{jt} = s_j(\delta_t, P_t)$$

has a unique solution for the vector $\delta_t$ as long as all goods present in equilibrium have
positive market shares. Thus, we may write

$$\delta_{jt} = g_j(S_t, P_t) \tag{2.12}$$

for some function $g_j$, where $S_t = (S_{1t}, \ldots, S_{Jt})'$.

From (2.12), we have that $\delta_j(X_{jt}, \xi_{jt}) = g_j(S_t, P_t)$. To complete the argument, Berry
and Haile (2010) assume that the function $\delta_j(X_{jt}, \xi_{jt})$ is strictly increasing in its second
argument, which represents unobserved product attributes. This condition rules out
the case where $\xi_{jt}$ represents attributes that would increase utility for some individuals
but decrease utility for others, and again corresponds to the notion that $\xi_{jt}$ represents
unobserved product quality, an increase in which unambiguously makes the product more
desirable. With the assumed monotonicity in the function $\delta_j$, one obtains

$$\xi_{jt} = \delta_j^{-1}(S_t, P_t; X_{jt}) = h_j(S_t, P_t, X_{jt}).$$

It is also clear that $h_j(S_t, P_t, X_{jt})$ is strictly increasing in $S_{jt}$, which is proven in Lemma 5
of Berry and Haile (2010), from which it follows that

$$S_{jt} = q_j(S_{-jt}, P_t, X_{jt}, \xi_{jt}),$$

where $S_{-jt}$ denotes the set of market shares for each product in market $t$ excluding product
$j$ and $q_j$ is an unknown function that is strictly increasing in $\xi_{jt}$. Then, $q_j$ can be
taken as the structural function in the instrumental variables quantile model after the
normalization that $\xi_{jt}$ follows a $U(0, 1)$ distribution, assuming that $\xi_{jt}$ has an atomless distribution.
The model is then completed by assuming the existence of instruments, $Z_t$, that are
independent of $\xi_{jt}$ conditional on $X_{jt}$ and are related to the endogenous variables through
$(S_{-jt}', P_t')' = A(Z_t, X_{jt}, V_t)$ for some function $A$ and unobservables $V_t$. Finally, note that
the model assumes rank invariance in its construction.
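To make the share-inversion step concrete, the following is a minimal sketch for one special case only: a plain logit model with an outside good whose index is normalized to zero and with the price term folded into the index. These are assumptions of the sketch, not of Berry and Haile (2010), whose result covers far more general $s_j(\delta_t, P_t)$; the function names are hypothetical. In this special case the unique solution for $\delta_t$ is available in closed form.

```python
import math

def logit_shares(delta):
    """Inside-good shares for a plain logit; the outside good has index 0."""
    m = max(0.0, max(delta))                  # subtract the max for numerical stability
    expd = [math.exp(d - m) for d in delta]
    denom = math.exp(-m) + sum(expd)          # exp(-m) is the outside good's term
    return [e / denom for e in expd]

def invert_shares(shares):
    """Recover the indices from shares: delta_j = log(s_j) - log(s_0)."""
    s0 = 1.0 - sum(shares)                    # share of the outside good
    if s0 <= 0 or any(s <= 0 for s in shares):
        raise ValueError("every good, including the outside good, needs a positive share")
    return [math.log(s) - math.log(s0) for s in shares]

delta_true = [1.0, -0.5, 0.3]                 # hypothetical product indices
shares = logit_shares(delta_true)
delta_back = invert_shares(shares)            # should reproduce delta_true
```

The positivity check mirrors the requirement in the text that all goods present in equilibrium have positive market shares.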




3. The Identifying Power of IV Quantile Restrictions 



The purpose of this section is to examine the identifying power of the conditional moment
restrictions (2.3). Specifically, we give various conditions for point identification in this
section, summarizing and updating some of the results known in the literature. We
remark here that point identification is not required in applications in principle, as there
exist inference methods that apply without point identification. However, it is useful to
know and understand conditions under which the moment conditions are informative enough
that the identification region shrinks to a single point; in such cases the inference methods
will also produce very informative confidence sets. We present point-identifying conditions
first for the binary case, $D \in \{0, 1\}$ and $Z \in \{0, 1\}$, then consider the case of $D$ taking
a finite number of values, and finally consider the continuous case.

3.1. Conditions for point identification in the binary case. Here we consider the
case where $D \in \{0, 1\}$ and $Z \in \{0, 1\}$. The following analysis is all conditional on
$X = x$ and for a given quantile $\tau \in (0, 1)$, but we suppress this dependence for ease of
notation. Under the conditions of Theorem 1, we know that there is at least one function
$q(d) := q(d, x, \tau)$ that solves $P[Y \le q(D) \mid Z] = \tau$ a.s. The function $q(\cdot)$ can be equivalently
represented by a vector of its values $q = (q(0), q(1))'$. Therefore, for vectors of the form
$y = (y_0, y_1)'$, we have a vector of moment equations

$$\Pi(y) := \left( P[Y \le y_D \mid Z = 0] - \tau,\; P[Y \le y_D \mid Z = 1] - \tau \right)', \tag{3.13}$$

where $y_D := (1 - D) \cdot y_0 + D \cdot y_1$. We say that $q$ is identified in some parameter space, $\mathcal{L}$,
if $y = q$ is the only solution to $\Pi(y) = 0$ among all $y \in \mathcal{L}$.

We require that the Jacobian $\partial\Pi(y)$ of $\Pi(y)$ with respect to $y = (y_0, y_1)'$ exists and
that it takes the form

$$\partial\Pi(y) := \begin{pmatrix} f_Y(y_0 \mid D = 0, Z = 0) P[D = 0 \mid Z = 0] & f_Y(y_1 \mid D = 1, Z = 0) P[D = 1 \mid Z = 0] \\ f_Y(y_0 \mid D = 0, Z = 1) P[D = 0 \mid Z = 1] & f_Y(y_1 \mid D = 1, Z = 1) P[D = 1 \mid Z = 1] \end{pmatrix} = \begin{pmatrix} f_{Y,D}(y_0, 0 \mid Z = 0) & f_{Y,D}(y_1, 1 \mid Z = 0) \\ f_{Y,D}(y_0, 0 \mid Z = 1) & f_{Y,D}(y_1, 1 \mid Z = 1) \end{pmatrix}. \tag{3.14}$$
For local identification, we take $\mathcal{L}$ as an open neighborhood of $q = (q(0), q(1))'$. For
global identification, we shall use some definitions from Mas-Colell (1979) to define $\mathcal{L}$. In what
follows, for every proper (non-null) subspace $L \subset \mathbb{R}^l$, let $\mathrm{proj}_L : \mathbb{R}^l \to L$ denote the
perpendicular projection map. A convex, compact polytope is a bounded convex set formed
by an intersection of a finite number of closed half-spaces. Such a polytope is of full
dimension in $\mathbb{R}^l$ if it has a non-empty interior in $\mathbb{R}^l$. A face of a polytope $\mathcal{L}$ is the
intersection of any supporting hyperplane of $\mathcal{L}$ with $\mathcal{L}$, so that faces of a polytope necessarily
include the polytope itself. For instance, a rectangle in $\mathbb{R}^2$ has one 2-dimensional face
given by itself, four 1-dimensional faces given by its edges, and four 0-dimensional faces
given by its vertices. A subspace spanned by a non-empty face of $\mathcal{L}$ is the translation to
the origin of the minimal affine space containing that face.

Theorem 2 (Identification by Full Rank Conditions). Suppose that $\Pi(q) = 0$, the support
of $D$ is $\{0, 1\}$ and the support of $Z$ is $\{0, 1\}$. Assume that the conditional density
$f_Y(y \mid D = d, Z = z)$ exists for each $y \in \mathbb{R}$ and $(d, z) \in \{0, 1\} \times \{0, 1\}$. (i) (Local) Suppose the
Jacobian $\partial\Pi$ given by (3.14) is continuous and has full rank at $y = q$. Then the $\tau$-quantiles
of potential outcomes, $q = (q(0), q(1))'$, are identified in the region $\mathcal{L}$ given by a sufficiently
small open neighborhood of $q$ in $\mathbb{R}^2$. (ii) (Global) Assume that region $\mathcal{L}$ contains $q$ and
can be covered by a finite number of compact convex 2-dimensional polytopes $\{\mathcal{L}_j\}$, each
containing $q$. Assume that for each $j$, $\partial\Pi$ is a $C^1$ Jacobian of $\Pi : \mathcal{L}_j \to \mathbb{R}^2$, and that,
possibly after rearranging the rows of $\partial\Pi$, for each $y \in \mathcal{L}_j$ and each subspace $L \subset \mathbb{R}^2$
spanned by a face of $\mathcal{L}_j$ that includes $y$, the linear map

$$\mathrm{proj}_L \circ \partial\Pi(y) : L \to L$$

has a positive determinant. Then $q$ is identified in $\mathcal{L}$.
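As a rough numerical illustration of the full-rank requirement, the sketch below evaluates the $2 \times 2$ Jacobian in (3.14) at a point and computes its determinant. The design is purely hypothetical and chosen for transparency: $Y \mid D = d, Z = z \sim N(d, 1)$ and $P[D = 1 \mid Z = z]$ given by `p_treat[z]`; none of these names or values come from the original article. In this design the determinant is proportional to $P[D = 1 \mid Z = 1] - P[D = 1 \mid Z = 0]$, so it vanishes when the instrument does not shift treatment.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def joint_density(y, d, z, p_treat):
    """f_{Y,D}(y, d | Z = z) = f_Y(y | D = d, Z = z) * P[D = d | Z = z].
    Hypothetical design: Y | D = d, Z = z ~ N(d, 1); P[D = 1 | Z = z] = p_treat[z]."""
    p_d = p_treat[z] if d == 1 else 1.0 - p_treat[z]
    return normal_pdf(y, mean=float(d)) * p_d

def jacobian(y0, y1, p_treat):
    """Matrix in (3.14): rows indexed by z = 0, 1; columns by d = 0, 1."""
    return [[joint_density(y0, 0, z, p_treat), joint_density(y1, 1, z, p_treat)]
            for z in (0, 1)]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Instrument shifts treatment take-up: determinant is positive.
relevant = det2(jacobian(0.0, 1.0, p_treat={0: 0.2, 1: 0.7}))
# Instrument unrelated to treatment: determinant is zero, full rank fails.
irrelevant = det2(jacobian(0.0, 1.0, p_treat={0: 0.5, 1: 0.5}))
```

This checks the determinant at a single point only; the global result additionally requires the sign condition at every point of the parameter space and on every face.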

The first result is a simple local identification condition of the type considered in
Rothenberg (1971), which we provide to fix ideas. The second result is a global
identification condition which extends the result in Chernozhukov and Hansen (2005) by
allowing non-rectangular sets $\mathcal{L}$. This result is based on the global univalence theorems
of Mas-Colell (1979). As explained below, the positive determinant condition requires
the impact of instrument $Z$ on the joint distribution of $(Y, D)$ to be sufficiently rich. In
particular, the instrument $Z$ should not be independent of the endogenous variable $D$.
We note that existence of the conditional density $f_Y(y \mid D = d, Z = z)$ is only required
for $(d, z)$ in the support of $(D, Z)$. Outside the support we can define the conditional
density as 0, so the existence condition is not very restrictive. Moreover, the condition
is formulated so that $\mathcal{L}$ can take on relatively rich shapes that can carry useful economic
restrictions. For instance, in the training context, a useful restriction on the parameters is
that training weakly increases the potential earning quantiles. This restriction can be
implemented by taking some natural parameter space and intersecting it with the half-space
$H = \{(y_0, y_1) \in \mathbb{R}^2 : y_1 \ge y_0\}$. Specifically, a cube $C = \{y \in \mathbb{R}^2 : \|y\|_\infty \le K\}$ intersected
with the half-space $H$ is an example of a region $\mathcal{L}$ permitted by the global identification
result (ii).
Comment 3.1 (Simple Sufficient Conditions). To illustrate the conditions of the theorem,
let us consider the parameter space $\mathcal{L}$ as either $\mathcal{L} = q + C$, i.e. a cube centered at $q$, or
$\mathcal{L} = (q + C) \cap H$, i.e. the intersection of a cube centered at $q$ with the half-space $H$. Consider
the trivial covering of $\mathcal{L}$ by itself, i.e. $\mathcal{L}_j = \mathcal{L}$. Then the positive determinant condition
of the theorem is implied by the following simple conditions:

$$\frac{f_{Y,D}(y_1, 1 \mid Z = 1)}{f_{Y,D}(y_0, 0 \mid Z = 1)} > \frac{f_{Y,D}(y_1, 1 \mid Z = 0)}{f_{Y,D}(y_0, 0 \mid Z = 0)} \quad \text{for all } y = (y_0, y_1) \in \mathcal{L}, \tag{3.15}$$

and

$$f_{Y,D}(y_1, 1 \mid Z = 1) > 0, \quad f_{Y,D}(y_0, 0 \mid Z = 0) > 0, \quad \text{for all } y = (y_0, y_1) \in \mathcal{L}. \tag{3.16}$$

Alternatively, since we can rearrange the rows of $\partial\Pi$, which corresponds to reordering
elements of vector $\Pi$, the positive determinant condition of the theorem is implied by the
following simple conditions:

$$\frac{f_{Y,D}(y_1, 1 \mid Z = 1)}{f_{Y,D}(y_0, 0 \mid Z = 1)} < \frac{f_{Y,D}(y_1, 1 \mid Z = 0)}{f_{Y,D}(y_0, 0 \mid Z = 0)} \quad \text{for all } y = (y_0, y_1) \in \mathcal{L}, \tag{3.17}$$

and

$$f_{Y,D}(y_1, 1 \mid Z = 0) > 0, \quad f_{Y,D}(y_0, 0 \mid Z = 1) > 0, \quad \text{for all } y = (y_0, y_1) \in \mathcal{L}. \tag{3.18}$$

The proof that these are sufficient conditions is given in the appendix, and below we
discuss the economic plausibility of these conditions.
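The link between conditions (3.15)-(3.16) and the positive determinant requirement can be sketched directly for the cube case. The shorthand $a, b, c, d$ for the entries of the Jacobian in (3.14) is introduced here only for illustration; the appendix proof referred to above remains the complete argument.

```latex
\partial\Pi(y) = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
\qquad
\begin{aligned}
a &= f_{Y,D}(y_0, 0 \mid Z = 0), & b &= f_{Y,D}(y_1, 1 \mid Z = 0), \\
c &= f_{Y,D}(y_0, 0 \mid Z = 1), & d &= f_{Y,D}(y_1, 1 \mid Z = 1).
\end{aligned}

% Two-dimensional face (the cube itself), L = \mathbb{R}^2:
\det \partial\Pi(y) = ad - bc > 0
\iff \frac{d}{c} > \frac{b}{a}
\quad \text{when } a > 0,\ c > 0 .

% One-dimensional faces (edges parallel to the axes), L = \mathrm{span}(e_1) or \mathrm{span}(e_2):
e_1' \, \partial\Pi(y) \, e_1 = a > 0,
\qquad
e_2' \, \partial\Pi(y) \, e_2 = d > 0 .
```

The first display is the ratio inequality (3.15) and the second is (3.16); if $c = 0$, the determinant reduces to $ad$, which is positive under (3.16). When the cube is intersected with $H$, the additional edge along $y_1 = y_0$ has direction $v = (1, 1)'/\sqrt{2}$, and $v' \partial\Pi(y) v = (a + b + c + d)/2$ is positive whenever $a > 0$ because all four entries are nonnegative.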

Comment 3.2 (Plausibility of (3.15) and (3.16)). Condition (3.16) seems quite mild,
so we focus on (3.15). We can illustrate (3.15) by considering the problem of evaluating
a training program where the $Y$'s are earnings, the $D$'s $\in \{0, 1\}$ are training states, and the $Z$'s
$\in \{0, 1\}$ are offers of training service. Condition (3.15) may be interpreted as a monotone
likelihood ratio condition. That is, the instrument $Z$ should have a monotonic impact
on the likelihood ratio specified in (3.15). This monotonicity may be a weak condition
in some contexts and a strong condition in others. For instance, if $\mathcal{L}$ is a cube $q + C$,
then this condition may be considered relatively strong. On the other hand, if we impose
monotonicity of the training impact on earning quantiles, so that $q(0) \le q(1)$, i.e. $q \in \mathcal{L} =
(q + C) \cap H$, then condition (3.15) would be trivially satisfied in many empirical settings.
Indeed, it would suffice that the instrument $Z$, the offer of training services, increases the
relative joint likelihood of receiving higher earnings and receiving the training service. In
many instances, we also have $P[D = 1 \mid Z = 0] = 0$; e.g. those not offered training services
do not receive that training. When $P[D = 1 \mid Z = 0] = 0$, the right-hand side of (3.15)
equals 0, which makes the identification condition (3.15) satisfied trivially even for the less
convenient parameter sets such as $\mathcal{L} = q + C$.

3.2. Identification with Multiple Points of Support. We generalize the result of
Theorem 2 to more general discrete treatments with discrete instruments. Consider the
case when $D$ has the support $\{1, \ldots, l\}$ and $Z$ has the support $\{1, \ldots, r\}$ ($l \le r < \infty$). Note
that the function $q(\cdot)$ can be represented by a vector $q = (q(1), \ldots, q(l))' \in \mathbb{R}^l$. Under the
conditions of Theorem 1, there is at least one function $q(d)$ that solves $P[Y \le q(D) \mid Z] =
\tau$ a.s. Therefore, for vectors of the form $y = (y_1, \ldots, y_l)'$ and the vector of moment equations

$$\Pi(y) = \left( P[Y \le y_D \mid Z = z] - \tau,\; z = 1, \ldots, r \right)', \tag{3.19}$$

where $y_D := \sum_d 1[D = d] \cdot y_d$, the model is identified if $y = q$ uniquely solves $\Pi(y) = 0$.

We define the matrix $\partial\Pi(y)$ as the $r \times l$ matrix with $(z, d)$ element given by $f_Y(y_d \mid D = d, Z =
z) P[D = d \mid Z = z]$, where $z = 1, \ldots, r$ and $d = 1, \ldots, l$. We require this to be the Jacobian
matrix of the map $y \mapsto \Pi(y)$ and impose full-rank-type conditions on submatrices of this
Jacobian. To this end, let $m$ denote any permutation of $l$ distinct integers from $\{1, \ldots, r\}$,
called an $l$-permutation, and let $\mathcal{M}$ be the collection of all such permutations. Let $\Pi_m := (\Pi_j)_{j \in m}$,
which maps $\mathbb{R}^l$ to $\mathbb{R}^l$, be a subvector of $\Pi$ formed by selecting the $j$-th elements of $\Pi$ according
to their order in $m$.$^{11}$ Let $\partial\Pi_m$ denote the corresponding $l \times l$ Jacobian matrix of $\Pi_m$.
The following theorem generalizes Theorem 2.

Theorem 3 (Identification for Discrete D). Suppose $\Pi(q) = 0$, the support of $D$ is
$\{1, \ldots, l\}$ and of $Z$ is $\{1, \ldots, r\}$. Assume that the conditional density $f_Y(y \mid D = d, Z = z)$
exists for each $y \in \mathbb{R}$ and $(d, z) \in \{1, \ldots, l\} \times \{1, \ldots, r\}$. (i) (Local) Suppose the Jacobian
$\partial\Pi(y)$ defined above is continuous and has rank $l$ at $y = q$. Then the $\tau$-quantiles of
potential outcomes, $q$, are identified in the region $\mathcal{L}$ given by a sufficiently small open
neighborhood of $q$ in $\mathbb{R}^l$. (ii) (Global) Assume that region $\mathcal{L}$ contains $q$ and can be covered
by a finite number of compact convex $l$-dimensional polytopes $\{\mathcal{L}_j\}$, each containing $q$ and
having the following properties: For each $j$ there is an $l$-permutation $m(j) \in \mathcal{M}$ such
that $\partial\Pi_{m(j)}$ is the $C^1$ Jacobian of $\Pi_{m(j)} : \mathcal{L}_j \to \mathbb{R}^l$, and for each $y \in \mathcal{L}_j$ and each subspace
$L \subset \mathbb{R}^l$ spanned by a face of $\mathcal{L}_j$ that includes $y$, the linear map

$$\mathrm{proj}_L \circ \partial\Pi_{m(j)}(y) : L \to L$$

has a positive determinant. Then $q$ is identified in $\mathcal{L}$.



11 Note that this formulation allows reordering elements of $\Pi$, which may be needed to achieve the
required positive determinant condition as discussed in the binary case.
We note that in the theorem existence of the conditional density $f_Y(y \mid D = d, Z = z)$ is
only required for $(d, z)$ in the support of $(D, Z)$. This density can be defined to take on an
arbitrary value for $(d, z)$ outside the support. The first result is a simple local identification
condition provided to fix ideas. The second result is a global identification condition based
on Global Univalence Theorem 1 of Mas-Colell (1979). This result complements a similar
result given in Chernozhukov and Hansen (2005) based on Global Univalence Theorem 2 of
Mas-Colell (1979). The positive determinant condition requires the impact of instrument
$Z$ on the joint distribution of $(Y, D)$ to be sufficiently rich.
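The role of the $l$-permutations can be illustrated with a small sketch that, for a given $r \times l$ Jacobian evaluated at one point, lists every ordered selection of $l$ rows whose square submatrix has a positive determinant. The numerical matrix is hypothetical, rows are indexed from 0 rather than 1, and the check covers only the full-dimensional determinant at a single point, not the face-by-face condition of Theorem 3.

```python
from itertools import permutations

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting (pure Python)."""
    a = [row[:] for row in matrix]
    n = len(a)
    sign, result = 1.0, 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-14:
            return 0.0                       # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign                     # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result

def positive_det_permutations(jac):
    """All l-permutations m of the r rows for which det(dPi_m) > 0."""
    r, l = len(jac), len(jac[0])
    return [m for m in permutations(range(r), l)
            if det([jac[z] for z in m]) > 0]

# Hypothetical 3 x 2 Jacobian: r = 3 instrument values, l = 2 treatment values.
jac = [[0.30, 0.05],
       [0.20, 0.15],
       [0.10, 0.25]]
good = positive_det_permutations(jac)
```

Reversing the order of a selected pair of rows flips the sign of the determinant, which is why the formulation in terms of ordered permutations, rather than unordered subsets, matters.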

Comment 3.3 (An Alternative Sufficient Condition). Here we recall an alternative
sufficient condition from Chernozhukov and Hansen (2005), which is based on the Global
Univalence Theorem 2 of Mas-Colell (1979). Assume that region $\mathcal{L}$ contains $q$ and can be
covered by a finite number of compact convex $l$-dimensional sets $\{\mathcal{L}_j\}$, each containing
$q$ and having the following properties: (i) for each $j$, there is a permutation $m(j) \in \mathcal{M}$
such that $\partial\Pi_{m(j)}$ is the $C^1$ Jacobian of $\Pi_{m(j)} : \mathcal{L}_j \to \mathbb{R}^l$; (ii) for each $y \in \mathcal{L}_j$,

$$\det[\partial\Pi_{m(j)}(y)] > 0;$$

(iii) $\mathcal{L}_j$ possesses a $C^1$-smooth boundary $\partial\mathcal{L}_j$; and (iv) for each $y \in \partial\mathcal{L}_j$,
$\ell'(\partial\Pi_{m(j)}(y) + \partial\Pi_{m(j)}(y)')\ell > 0$ for each $\ell \in T(y)$ with $\ell \ne 0$, where $T(y)$ is the subspace tangent to $\mathcal{L}_j$
at point $y$. Then $q$ is identified in $\mathcal{L}$. This result seems to require slightly stronger
conditions on the boundary than the condition used in Theorem 3. The advantage of the
conditions from Chernozhukov and Hansen (2005) is that they more transparently convey
the full-rank nature of the conditions imposed.

3.3. Identification with general D. Finally we consider conditions for point
identification in the case of more general $D$ and $Z$ that may take on a continuum of values.
We let $d$ denote elements in the support of $D$ and $z$ denote elements in the support
of $Z$. Without loss of much generality, we restrict attention to the case where both $Y$
and $D$ have bounded support. We require the parameter space $\mathcal{L}$ to be a collection of
bounded (measurable) functions $m : \mathbb{R}^k \to \mathbb{R}$ containing $q(\cdot)$. We say that $q(\cdot)$ such
that $P[Y \le q(D) \mid Z] = \tau$ a.s. is identified in $\mathcal{L}$ if for any other $m(\cdot) \in \mathcal{L}$ such that
$P[Y \le m(D) \mid Z] = \tau$ a.s., $m(D) = q(D)$ a.s. Below, we use $\|\cdot\|_{P,p}$ to denote the $L^p(P)$
norm.

Theorem 4 (Identification with General D). Suppose that $P[Y \le q(D) \mid Z] = \tau$ a.s. and
both $Y$ and $D$ have bounded support. Consider a parameter space $\mathcal{L}$ which is a collection
of bounded (measurable) functions $m : \mathbb{R}^k \to \mathbb{R}$ containing $q(\cdot)$. Assume that for $\varepsilon :=
Y - q(D)$ the conditional density $f_\varepsilon(e \mid D, Z)$ exists for each $e \in \mathbb{R}$, a.s. (i) (Global) Suppose
that for each $\Delta(d) := m(d) - q(d)$ with $m(\cdot) \in \mathcal{L}$, $\omega_\Delta(D, Z) := \int_0^1 f_\varepsilon(\delta\Delta(D) \mid D, Z)\, d\delta > 0$
a.s. and

$$E[\Delta(D) \cdot \omega_\Delta(D, Z) \mid Z] = 0 \text{ a.s.} \implies \Delta(D) = 0 \text{ a.s.} \tag{3.20}$$

Then $q(\cdot)$ is identified in $\mathcal{L}$. (ii) (Local) Suppose that $\omega_0(D, Z) := f_\varepsilon(0 \mid D, Z) > 0$ a.s.
and for each $\Delta(d) := m(d) - q(d)$ with $m(\cdot) \in \mathcal{L}$,

$$E[\Delta(D) \cdot \omega_0(D, Z) \mid Z] = 0 \text{ a.s.} \implies \Delta(D) = 0 \text{ a.s.}, \tag{3.21}$$

and, for some $0 \le \eta < 1$ and $1 \le p$,

$$\|E[\Delta(D) \cdot \{\omega_\Delta(D, Z) - \omega_0(D, Z)\} \mid Z]\|_{P,p} \le \eta\, \|E[\Delta(D) \cdot \omega_0(D, Z) \mid Z]\|_{P,p}. \tag{3.22}$$

Then $q(\cdot)$ is identified in $\mathcal{L}$.

Condition (i), mentioned in Chernozhukov and Hansen (2005), states a non-linear
bounded completeness condition for global identification. The required condition (3.20) is
not primitive, but it highlights a useful link with the linear bounded completeness condition,
$E[\Delta(D) \mid Z] = 0$ a.s. $\implies \Delta(D) = 0$ a.s., used by Newey and Powell (2003). The latter
condition is needed for identification in the mean IV model $E[Y - q(D) \mid Z] = 0$ under the
assumption of a bounded structural function $q$. It is known to be quite
weak, as shown in D'Haultfoeuille (2011), and there are many primitive sufficient conditions
that imply it. Andrews (2011) shows that linear completeness is generic
under some conditions. Although condition (3.20) is not primitive, it is not vacuous either,
since the previous theorems provide primitive conditions for its validity. The local
identification condition (ii), obtained by Chen, Chernozhukov, Lee, and Newey (2011),
provides yet another sufficient condition for condition (i). The result (ii) replaces the
non-linear completeness condition (3.20) by the linear completeness condition (3.21), which is
easier to check. The result (ii) also implicitly requires that the set $\mathcal{L}$ is a sufficiently small
neighborhood of $q$ and that functional deviations $m(\cdot) - q(\cdot)$ and the conditional density
$f_\varepsilon(\cdot \mid D, Z)$ are sufficiently smooth. This is explained in detail in Chen, Chernozhukov,
Lee, and Newey (2011), where further primitive smoothness and completeness conditions
are also provided.
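The reason the weighted completeness condition (3.20) delivers identification can be seen from a short mean-value calculation. The notation $F_\varepsilon(\cdot \mid D, Z)$ for the conditional distribution function of $\varepsilon = Y - q(D)$ is introduced here for the sketch; the remaining symbols are those of Theorem 4.

```latex
\begin{aligned}
P[Y \le m(D) \mid Z] - P[Y \le q(D) \mid Z]
 &= E\big[ F_\varepsilon(\Delta(D) \mid D, Z) - F_\varepsilon(0 \mid D, Z) \,\big|\, Z \big] \\
 &= E\Big[ \Delta(D) \int_0^1 f_\varepsilon(\delta \Delta(D) \mid D, Z)\, d\delta \,\Big|\, Z \Big] \\
 &= E\big[ \Delta(D) \cdot \omega_\Delta(D, Z) \,\big|\, Z \big].
\end{aligned}
```

The first equality uses iterated expectations and the fact that $Y \le m(D)$ is the same event as $\varepsilon \le \Delta(D)$. If $m(\cdot)$ also satisfies the quantile restriction, the left-hand side equals $\tau - \tau = 0$ a.s., so (3.20) forces $\Delta(D) = 0$ a.s., that is, $m(D) = q(D)$ a.s.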

4. Other Approaches to Quantile Models with Endogeneity 

There are, of course, other sets of modeling assumptions that one could employ to 
build a quantile model with endogeneity. In this section, we briefly outline two other 
approaches that have been taken in the literature. The first, due to Abadie, Angrist, and
Imbens (2002), extends the local average treatment effect (LATE) framework of Imbens 
and Angrist (1994) to quantile treatment effects. The second, considered in Imbens and 
Newey (2009) and Lee (2007), uses a triangular structure to obtain identification. 

4.1. Local Quantile Treatment Effects with Binary Treatment and Instrument. 

In fundamental work, Abadie, Angrist, and Imbens (2002) develop an approach to esti- 
mating quantile treatment effects within the LATE framework of Imbens and Angrist 
(1994) in the case where both the instrument and treatment variable are binary. The 
use of the LATE framework makes this approach appealing as many applied researchers 
are familiar with LATE and the conditions that allow identification and consistent es- 
timation of this quantity. Importantly, the extension proceeds under exactly the same 
monotonicity requirement as needed for LATE. 

Specifically, Abadie, Angrist, and Imbens (2002) show that the QTE for a subpopulation 
is identified if 

1. (Independence) the instrument $Z$ is independent of the potential outcome errors,
$\{U_d\}$, and the errors in the selection equation, $V$;

2. (Monotonicity) $P(D_1 \ge D_0 \mid X) = 1$ holds, where $D_1$ is the treatment state of an
individual when $Z = 1$ and $D_0$ is defined similarly;

3. and other standard conditions are met. 

The subpopulation for whom the QTE is identified is the set of "compliers," those
individuals with $D_1 > D_0$. In other words, the compliers are the set of individuals whose
treatment is altered by switching the instrument from zero to one. Monotonicity is key in 
this framework. The monotonicity condition rules out "defiers," individuals who would 
receive treatment in the absence of the intervention represented by the instrument but 
would not receive treatment if placed into the treatment group. The effects for individuals 
who would always receive treatment or never receive treatment regardless of the value of 
the instrument are unidentified. 

Looking at these conditions, we see that the model of Abadie, Angrist, and Imbens 
(2002) replaces the monotonicity assumption (A1), the independence assumption (A2),
and the similarity assumption (A4) with a different type of monotonicity and a stronger
independence assumption and identifies a different quantity: the QTE for compliers. The
LATE-style approach has not yet been extended beyond cases with a binary treatment and
a single binary instrument while the instrumental variable quantile model of Chernozhukov
and Hansen (2005) applies to any endogenous variables and instruments. Note that neither 
set of conditions nests the other, and neither framework is more general than the other. 
Thus, the frameworks are best viewed as complements, providing two sets of conditions 
that can be considered when thinking about a strategy for estimating heterogeneous 
treatment effects. 

Of course, the two sets of conditions may be mutually compatible. One such case is 
discussed in Chernozhukov and Hansen (2004). In this example, the pattern of results 
obtained from the two estimators is quite similar, and the difference between the estimates 
appears small relative to sampling variation. Further exploration of these two approaches 
and their similarities and differences may be interesting to consider. 

4.2. Instrumental Variables Quantile Regression in Triangular Systems. An- 
other compelling framework is based on assuming a triangular structure as in Imbens 
and Newey (2009). See also Chesher (2003), Koenker and Ma (2006), and Lee (2007) for 
related models and results. The triangular model takes the form of a triangular system 
of equations 

$$Y = g(D, \varepsilon),$$
$$D = h(Z, \eta),$$

where $Y$ is the outcome, $D$ is a continuous scalar endogenous variable, $\varepsilon$ is a vector of
disturbances, $Z$ is a vector of instruments with a continuous component, $\eta$ is a scalar
reduced form error, and we ignore other covariates for simplicity. It is important to note
that the triangular system generally rules out simultaneous equations, in which
the reduced form relating $D$ to $Z$ typically depends on a vector of disturbances. For example, in
a supply and demand system, the reduced form for both price and quantity will generally
depend on the unobservables from both the supply equation and the demand equation.
Outside of $\eta$ being a scalar, the key conditions that allow identification of quantile effects
in the triangular system are

1. (Monotonicity) The function $\eta \mapsto h(Z, \eta)$ is strictly increasing in $\eta$, and

2. (Independence) $D$ and $\varepsilon$ are independent conditional on $V$ for some observable or
estimable $V$.

The variable $V$ is thus the "control function" conditional on which changes in $D$ may
be taken as causal. Imbens and Newey (2009) use $V = F_{D \mid Z}(D, Z) = F_\eta(\eta)$, where $F_\eta(\cdot)$
represents the CDF of $\eta$, as the control function and show that this variable satisfies the
independence condition under the additional condition that $(\varepsilon, \eta)$ is independent of $Z$.
They show that one may use $D = h(Z, \eta)$ to identify $V$ under the assumed monotonicity
of $h(Z, \eta)$ in $\eta$. Using $V$ obtained in this first step, one may then construct the distribution
of $Y \mid D, V$. Then integrating over the distribution of $V$ and using iterated expectations,
one has

$$\int F_{Y \mid D, V}(y \mid d, v)\, F_V(dv) = \Pr(g(d, \varepsilon) \le y) := G(y, d).$$

It then follows that the $\tau$-th quantile of $Y_d$ is $G^{-1}(\tau, d)$.

As with the framework of Abadie, Angrist, and Imbens (2002), the triangular model
under the conditions given above is neither more nor less general than the model of
Chernozhukov and Hansen (2005). The key difference between the approaches is that
Chernozhukov and Hansen (2005) use an essentially unrestricted reduced form but require
monotonicity and a scalar disturbance in the structural equation. The triangular system,
on the other hand, relies on monotonicity of the reduced form in a scalar disturbance. In
addition, the triangular system, as developed in Imbens and Newey (2009), requires a
more stringent independence condition in that the instruments need to be independent of
both the structural disturbances and the reduced form disturbance. That the approaches
impose structure on different parts of the model makes them complementary, with a
researcher's choice between the two being dictated by whether it is more natural to impose
restrictions on the structural function or the reduced form in a given application.

The triangular model and the model of Chernozhukov and Hansen (2005) can be made
compatible by imposing the conditions from the triangular model on the reduced form
and the conditions from Chernozhukov and Hansen (2005) on the structural model.
Torgovitsky (2012) considers identification and estimation when both sets of conditions are
imposed and shows that the requirements on the instruments may be substantially relaxed
relative to Chernozhukov and Hansen (2005) or Imbens and Newey (2009) in this case.
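The first step of the triangular approach described above, recovering the control variable $V = F_{D \mid Z}(D, Z)$, can be sketched in a few lines. To keep the example free of smoothing choices, the sketch assumes a discrete instrument so that the conditional CDF can be estimated by within-group ranks; this is a simplification relative to the continuous-instrument setting in the text. The reduced form $h(z, \eta) = z + (1 + 0.5 z)\eta$ and all names are hypothetical.

```python
import random
from statistics import NormalDist

def control_variable(d, z):
    """Estimate V_i = F_{D|Z}(D_i | Z_i) by the empirical CDF of D within each Z group."""
    groups = {}
    for i, zi in enumerate(z):
        groups.setdefault(zi, []).append(i)
    v = [0.0] * len(d)
    for idx in groups.values():
        ranked = sorted(idx, key=lambda i: d[i])
        for rank, i in enumerate(ranked, start=1):
            v[i] = rank / len(ranked)
    return v

rng = random.Random(0)
n = 3000
z = [rng.choice([0, 1, 2]) for _ in range(n)]
eta = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical reduced form, strictly increasing in the scalar error eta.
d = [zi + (1.0 + 0.5 * zi) * e for zi, e in zip(z, eta)]

v_hat = control_variable(d, z)
v_true = [NormalDist().cdf(e) for e in eta]      # F_eta(eta)
max_gap = max(abs(a - b) for a, b in zip(v_hat, v_true))
```

Because $h$ is strictly increasing in $\eta$, ranking $D$ within an instrument group is the same as ranking $\eta$, which is why the estimated control variable tracks $F_\eta(\eta)$ even though $\eta$ is never observed.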



5. Estimation and Inference 



In the previous sections, we have outlined results that are useful for identifying quantile 
treatment effects and structural functions that are monotonic in a scalar unobservable. 
In the following, we briefly review the literature on estimation and inference. We focus
on estimation of the model of Chernozhukov and Hansen (2005) presented in Section 2
using the moment conditions derived in Theorem 1. For estimation of the triangular
model, see Imbens and Newey (2009) for nonparametric estimation and Lee (2007) for
a semiparametric approach. Abadie, Angrist, and Imbens (2002) provide results for
estimating the QTE for compliers within the LATE-style framework. Also, we only review
approaches for estimating parametric quantile functions: $q(D, X, \tau) = q(D, X, \tau; \theta)$ for
$\theta \in \Theta \subset \mathbb{R}^m$. Horowitz and Lee (2007) and Gagliardini and Scaillet (2012) present
nonparametric estimation and inference results for the IVQT model using condition (2.3).

There are two practical issues that make estimation and inference based on condition 
(2.3) challenging. The first is that the sample analog to condition (2.3) is non-smooth,
and the GMM objective function that would be formed by using (2.3) as the moment
conditions is also generically non-convex, even for linear quantile models. The second
problem is that the model may suffer from weak identification as in the standard linear
IV model; Stock, Wright, and Yogo (2002) provide a useful introductory survey of weak
identification and related inference methods in the linear IV model. In the quantile case, 
the problem of weak identification is more subtle than in the linear model in that some 
quantiles may be weakly identified while others may be strongly identified. The relevant 
object for defining the strength of identification of a given quantile is the covariance 
between D and Z weighted by the conditional density function of the unobservable at 
the given quantile. See Chernozhukov and Hansen (2008) for a formal definition of this 
object and related discussion. 

While the non-smoothness and non-convexity of the GMM criterion complicate
optimization, they do not render the approach infeasible, especially when the dimension of $D$
and X is not too large. Abadie (1997) considered this approach for estimating an income 
model and provides further discussion. One could also estimate the model parameters 
using the Markov Chain Monte Carlo (MCMC) approach of Chernozhukov and Hong 
(2003). This approach bypasses the need for optimization, instead relying on sampling 
and averaging to estimate model parameters. Note that this approach is not a cure-all 
since MCMC requires careful tuning in applications. It is also worth noting that standard 
samplers may perform poorly in even simple linear instrumental variables models when 
identification is not strong; see Hoogerheide, Kaashoek, and van Dijk (2007). In an ap- 
proach related to optimizing the GMM criterion function directly, Sakata (2007) proposes 
estimating the parameters of an instrumental variables quantile model by optimizing a 
different non-smooth, non-convex criterion function. 
To partially circumvent the numerical problems in optimizing the full GMM criterion,
Chernozhukov and Hansen (2006) suggest a different procedure termed the inverse quantile
regression for the linear quantile model $q(D, X, \tau) = D'\alpha(\tau) + X'\beta(\tau)$. The basic intuition
for the inverse quantile regression comes from the observation that if one knew the true
value of the coefficient on $D$, $\alpha(\tau)$, the $\tau$-th quantile regression of $Y - D'\alpha(\tau)$ onto $X$
and $Z$ would yield zero coefficients on the instruments $Z$. This observation allows one
to effectively concentrate $\beta(\tau)$ out of the problem and leaves a non-smooth, non-convex
optimization problem over only the parameters $\alpha(\tau)$. Since $D$ is low-dimensional in
many applications, one can usually solve this optimization problem using highly robust
optimization procedures such as a grid search.

Algorithmically, the inverse quantile regression estimates for a given probability index
$\tau$ of interest can be obtained as follows using a grid search over $\alpha(\tau)$:

1. Define a suitable set of values $\{\alpha_j, j = 1, \ldots, J\}$, and estimate the coefficients $\beta(\alpha_j, \tau)$
and $\gamma(\alpha_j, \tau)$ from the model $Y - D'\alpha_j = X'\beta(\alpha_j, \tau) + Z'\gamma(\alpha_j, \tau) + \varepsilon$ by running the ordinary
$\tau$-quantile regression of $Y - D'\alpha_j$ on $X$ and $Z$. Call the estimated coefficients $\hat\beta(\alpha_j, \tau)$
and $\hat\gamma(\alpha_j, \tau)$.

2. Save the inverse of the variance-covariance matrix of $\hat\gamma(\alpha_j, \tau)$; the variance-covariance
matrix itself is readily available in any common implementation of the ordinary QR. Denote this
variance-covariance matrix $\hat A(\alpha_j, \tau)$. Form $W_n(\alpha_j, \tau) = \hat\gamma(\alpha_j, \tau)' \hat A(\alpha_j, \tau)^{-1} \hat\gamma(\alpha_j, \tau)$. Note that
$W_n(\alpha_j, \tau)$ is the Wald statistic for testing $\gamma(\alpha_j, \tau) = 0$.

3. Choose $\hat\alpha(\tau)$ as a value among $\{\alpha_j, j = 1, \ldots, J\}$ that minimizes $W_n(\alpha, \tau)$. The
estimate of $\beta(\tau)$ is then given by $\hat\beta(\hat\alpha(\tau), \tau)$.
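The three steps above can be sketched in pure Python for the simplest possible design: scalar $D$, a single binary instrument $Z$, and no covariates other than a constant. In that saturated case the $\tau$-quantile regression of $Y - D\alpha_j$ on $(1, Z)$ reduces to group quantiles, so $\hat\gamma(\alpha_j, \tau)$ is the difference between the two group quantiles. The variance estimate uses a simple difference-quotient (Siddiqui-type) sparsity estimate with an arbitrary bandwidth `h`. These simplifications, the simulated design, and all function names are assumptions of this sketch, not the implementation of Chernozhukov and Hansen (2006); with covariates one would use a standard quantile regression routine instead.

```python
import random

def sample_quantile(values, tau):
    """Order-statistic estimate of the tau-quantile."""
    s = sorted(values)
    return s[min(len(s) - 1, max(0, int(tau * len(s))))]

def quantile_variance(values, tau, h=0.05):
    """tau(1 - tau) / (n f^2), with the sparsity 1/f approximated by a
    difference quotient of sample quantiles (bandwidth h is an arbitrary choice)."""
    sparsity = (sample_quantile(values, tau + h) - sample_quantile(values, tau - h)) / (2.0 * h)
    return tau * (1.0 - tau) * sparsity ** 2 / len(values)

def wald_stat(alpha, y, d, z, tau):
    """Steps 1-2: gamma_hat is the Z = 1 minus Z = 0 group quantile of Y - D * alpha."""
    r0 = [yi - alpha * di for yi, di, zi in zip(y, d, z) if zi == 0]
    r1 = [yi - alpha * di for yi, di, zi in zip(y, d, z) if zi == 1]
    gamma_hat = sample_quantile(r1, tau) - sample_quantile(r0, tau)
    var_hat = quantile_variance(r1, tau) + quantile_variance(r0, tau)
    return gamma_hat ** 2 / var_hat

def inverse_quantile_regression(y, d, z, tau, grid):
    """Step 3: return the grid point minimizing the Wald statistic, plus all (W_n, alpha) pairs."""
    stats = [(wald_stat(a, y, d, z, tau), a) for a in grid]
    return min(stats)[1], stats

# Hypothetical design: Y = alpha_true * D + U, where D depends on the rank U (endogeneity)
# and only individuals with Z = 1 can be treated.
rng = random.Random(1)
n, alpha_true = 4000, 1.0
y, d, z = [], [], []
for _ in range(n):
    u = rng.random()
    zi = 1 if rng.random() < 0.5 else 0
    di = 1 if (zi == 1 and u > 0.4) else 0
    y.append(alpha_true * di + u)
    d.append(di)
    z.append(zi)

grid = [0.05 * j for j in range(41)]          # 0.0, 0.05, ..., 2.0
alpha_hat, stats = inverse_quantile_regression(y, d, z, tau=0.5, grid=grid)
```

The grid search is deliberately brute-force: the criterion is non-smooth and non-convex in $\alpha$, so evaluating it at every grid point avoids the local-minimum problems a gradient-based optimizer could run into.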

Chernozhukov and Hansen (2006) and Chernozhukov and Hansen (2008) provide
conditions under which the resulting estimator for $\alpha(\tau)$ and $\beta(\tau)$ is consistent and
asymptotically normal and provide a consistent variance estimator. Marmer and Sakata (2012)
provide a similar multi-step algorithm that circumvents the same numeric problems using
the objective function of Sakata (2007).

The good behavior of the asymptotic approximations obtained in Chernozhukov and
Hansen (2006) and Chernozhukov and Hansen (2008) relies on strong identification of
the model parameters just as in the linear IV case. Intuitively, strong identification for a
quantile of interest requires that a particular density-weighted covariation matrix between
$D$ and $Z$ is not local to being rank deficient and that the impact of $Z$ is rich enough
to guarantee that the moment equations have a unique solution. The first condition is
analogous to the usual full rank condition in linear IV analysis, and the second condition is
required because of the nonlinearity of the problem. Checking these conditions in practice
may be difficult, and it is therefore useful to have inference procedures that are robust to
violations of these conditions.

Fortunately, there are several inference procedures that remain valid under weak
identification. A nice feature of the algorithm defined for estimating $\alpha(\tau)$ above is that
it produces a weak-identification-robust inference procedure naturally as a byproduct.
Chernozhukov and Hansen (2008) show that the Wald statistic $W_n(\alpha, \tau)$ converges in
distribution to a $\chi^2_{\dim(Z)}$ under the null that $\alpha = \alpha_0$, where $\alpha_0$ denotes the true value
of $\alpha(\tau)$, without needing either of the conditions discussed in the preceding paragraph.
Thus a valid $100(1 - p)\%$ confidence region for $\alpha(\tau)$ may be constructed as the set

\[
\{\alpha : W_n(\alpha, \tau) \le c_{1-p}\} \tag{5.23}
\]

where $c_{1-p}$ is such that $\Pr(\chi^2_{\dim(Z)} > c_{1-p}) = p$, and the set is approximated numerically
by considering the $\alpha_j$'s in the grid $\{\alpha_j, j = 1, \ldots, J\}$. Chernozhukov and Hansen (2008) show
that the confidence region in equation (5.23) is valid when the model parameters are strongly
identified and remains valid when the model is weakly identified or even unidentified.
Marmer and Sakata (2012) provide a similar result for their procedure. Jun (2008) provides
yet another approach to performing weak-identification-robust inference in models defined
by conditions (2.3). Finally, Chernozhukov, Hansen, and Jansson (2009) show that one can
form statistics for inference about the entire parameter vector $\theta$ that are conditionally
pivotal in finite samples for models defined by quantile restrictions such as (2.3). Since the
statistics do not depend on unknown nuisance parameters in finite samples, the exact
distributions of these statistics can be calculated and inference can proceed without relying
on asymptotic approximations or statements about the strength of identification. The
distributions produced in Chernozhukov, Hansen, and Jansson (2009) are not standard and
so must be calculated by simulation.
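
The grid approximation to the confidence region in equation (5.23) can be sketched as follows. The function name `wald_confidence_set` and the Wald values are hypothetical and chosen only for illustration. The critical value is passed in directly; for one instrument and $p = 0.05$ it is approximately 3.841, the 0.95 quantile of a chi-square distribution with one degree of freedom.

```python
import numpy as np

def wald_confidence_set(alpha_grid, wald, critical_value):
    """Return the grid points alpha_j with W_n(alpha_j, tau) <= critical_value.

    critical_value is the (1 - p) quantile of a chi-square distribution with
    dim(Z) degrees of freedom; with SciPy installed it could be obtained as
    scipy.stats.chi2.ppf(1 - p, df=dim_z).
    """
    alpha_grid = np.asarray(alpha_grid, dtype=float)
    wald = np.asarray(wald, dtype=float)
    keep = wald <= critical_value
    return alpha_grid[keep]

# Made-up Wald statistics on a five-point grid, one instrument, p = 0.05
alpha_grid = [0.0, 0.25, 0.5, 0.75, 1.0]
wald = [9.1, 3.2, 0.4, 2.9, 7.5]
region = wald_confidence_set(alpha_grid, wald, critical_value=3.841)
```

The retained set need not be an interval: when identification is weak it may be disconnected or cover the whole grid, which is why the sketch returns the set of grid points rather than two endpoints.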

6. Conclusion and Directions for Future Research 

In this paper, we have reviewed approaches for building quantile models in the
presence of endogeneity, focusing on conditions that can be used for identification. We have
also briefly reviewed some of the practical issues that arise in estimation of instrumental
variables quantile models and approaches to dealing with these issues. The models and
estimation strategies outlined and cited in this review have already seen use in
empirical economics where they have mostly been used for their ability to uncover interesting
distributional effects. In this review, we have also noted that the identification strategy
employed in this paper can be used to uncover structural objects even if quantile effects
are not the chief objects of interest as in Berry and Haile (2010).

While the results reviewed in this paper are useful in a variety of contexts, there remain
interesting areas for research in quantile models with endogeneity. In some applications,
features of the conditional distribution are not the chief objects of interest and researchers
are interested in effects of treatments on unconditional quantiles. Given the set of
conditional quantiles, such unconditional effects may be uncovered. In recent work, Froelich
and Melly (2008) propose a different approach, related to Abadie, Angrist, and Imbens
(2002), to estimating structural effects of endogenous variables on unconditional quantiles
directly. It would also be interesting to think about quantile-like quantities for
multivariate outcomes with endogenous covariates. The results reviewed in this paper offer one
possible approach for quantile modeling with endogeneity, but there remain many
interesting directions and other approaches to be explored in further research.



Appendix A. Proofs 

A.1. Proof of Theorem 1. Conditioning on $X = x$ is suppressed. For $P$-almost every
value $z$ of $Z$,

\[
\begin{aligned}
P[U_D \le \tau \mid Z = z]
&\overset{(1)}{=} \int P[U_D \le \tau \mid Z = z, V = v] \, dP[V = v \mid Z = z] \\
&\overset{(2)}{=} \int P[U_{\delta(z,v)} \le \tau \mid Z = z, V = v] \, dP[V = v \mid Z = z] \\
&\overset{(3)}{=} \int P[U_0 \le \tau \mid Z = z, V = v] \, dP[V = v \mid Z = z] \\
&\overset{(4)}{=} P[U_0 \le \tau \mid Z = z] \overset{(5)}{=} \tau.
\end{aligned}
\tag{A.24}
\]

Equality (1) is by definition. Equality (2) is by the representation A3. Equality (3) is
by the similarity assumption A4 and representation A3: conditional on $(V = v, Z = z)$,
$D = \delta(z, v)$ is a constant, so that by A4, $U_{\delta(z,v)}$ has the same distribution as $U_0$, where
"0" denotes any fixed value of $D$. Equality (4) is by definition, and equality (5) is by the
independence assumption A2. This shows the first result.






The second result follows from the first and the equivalence of the events
$\{q(D, U) \le q(D, \tau)\} = \{U \le \tau\}$ under $u \mapsto q(d, u)$ strictly increasing for each $d$ on
the domain $[0, 1]$. To show the third result we note that

\[
\{U \in I\} \subseteq \{\{u : q(D, u) = q(D, U)\} \cap I \neq \emptyset\}.
\]

Since $Y = q(D, U)$, the latter event is equivalent to the event $\{Y \in q(D, I)\}$, where
$q(D, I)$ denotes the image of $I$ under the mapping $u \mapsto q(D, u)$. The third result then
follows from the first result. □

A.2. Proof of Theorems 2 and 3. The local identification results follow by a standard
argument, introduced in Rothenberg (1971), which we omit for brevity. The global
identification result is obtained as follows. By assumption $q \in C$. Hence, we need to check
whether $y = q$ is the only solution to $\Pi(y) = 0$ over $C$. Consider a covering set $C_j$ and
the $l$-permutation $m(j)$ corresponding to it, as defined in the theorem. By assumption
$\Pi_{m(j)}(q) = 0$. By assumption $q \in C_j$. The stated rank conditions, compactness, and
convexity of the polytope $C_j$ imply that the mapping $y \mapsto \Pi_{m(j)}(y)$, which maps
$C_j \subset \mathbb{R}^l$ to $\mathbb{R}^l$, is a homeomorphism (one-to-one) between $C_j$ and $\Pi_{m(j)}(C_j)$ by the
global univalence theorem, Theorem 1 of Mas-Colell (1979). Thus, $y = q$ is the unique
solution of $\Pi_{m(j)}(y) = 0$ over $C_j$. Since this argument applies to every $j$ and $\{C_j\}$ cover $C$,
it follows that $y = q$ is the unique solution of $\Pi(y) = 0$ over $C$. □

A.3. Proof of Theorem 4. We have that $q$ solves $P[Y \le q(D) \mid Z] = \tau$ a.s., and $q \in C$ by
assumption. Hence we need to check whether $q$ is the only solution to $P[Y \le q(D) \mid Z] = \tau$
a.s. in $C$. Suppose there is $m \in C$ such that $P[Y \le m(D) \mid Z] = \tau$ a.s. Define
$\Delta(d) := m(d) - q(d)$, and write

\[
\begin{aligned}
P[Y \le m(D) \mid Z] - P[Y \le q(D) \mid Z]
&\overset{(1)}{=} E\left[ E\left[ \int_0^1 f_\epsilon(\delta \Delta(D) \mid D, Z) \Delta(D) \, d\delta \,\Big|\, D, Z \right] \Big|\, Z \right] \\
&\overset{(2)}{=} E\left[ \int_0^1 f_\epsilon(\delta \Delta(D) \mid D, Z) \Delta(D) \, d\delta \,\Big|\, Z \right] \\
&\overset{(3)}{=} E[\Delta(D) \cdot \omega_\Delta(D, Z) \mid Z].
\end{aligned}
\tag{A.25}
\]

Here (1) follows by the fundamental theorem of calculus, (2) by the law of iterated
expectations, and (3) by linearity of the Lebesgue integral. For uniqueness we need that
(A.25) $= 0$ a.s. $\Rightarrow$ $\Delta(D) = 0$ a.s., which is assumed. The result (i) follows.

Result (ii) is immediate from (i) by the triangle inequality for $\|\cdot\|_{P,p}$. □
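
Step (1) in (A.25) can be unpacked as follows. This is a sketch under assumptions not restated in this appendix: $\epsilon := Y - q(D)$, with conditional distribution function $F_\epsilon(\cdot \mid D, Z)$ and conditional density $f_\epsilon(\cdot \mid D, Z)$. Conditional on $(D, Z)$,

\[
\begin{aligned}
P[Y \le m(D) \mid D, Z] - P[Y \le q(D) \mid D, Z]
&= F_\epsilon(\Delta(D) \mid D, Z) - F_\epsilon(0 \mid D, Z) \\
&= \int_0^1 \frac{d}{d\delta} F_\epsilon(\delta \Delta(D) \mid D, Z) \, d\delta \\
&= \int_0^1 f_\epsilon(\delta \Delta(D) \mid D, Z) \Delta(D) \, d\delta,
\end{aligned}
\]

and taking conditional expectations given $Z$ yields the right-hand side of (1).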

A.4. Proof of Sufficiency of (3.15) and (3.16). Here we show that (3.15) and (3.16)
are sufficient for identification over the parameter space $\mathcal{L} = (q + C) \cap H$. Note that in
general $\mathcal{L}$ has at most five edges: the left and the right edges, parallel to each other,
the top and the bottom edges, also parallel to each other, and the edge generated by the
intersection of $\mathcal{L}$ with the 45 degree line. Let $e_1$ and $e_2$ be coordinate vectors in $\mathbb{R}^2$, and
let $L_k$ denote the various subspaces spanned by faces of $\mathcal{L}$ containing $y$. In particular, we
have that $L_1 := \mathbb{R}^2$ for all $y$ in the two-dimensional face $F_1 := \mathcal{L}$, $L_2 := \mathrm{span}(e_2)$ for all $y$
in the one-dimensional faces given by the left and the right edge of $\mathcal{L}$, denoted both by $F_2$,
$L_3 := \mathrm{span}(e_1)$ for all $y$ in the one-dimensional faces given by the top and bottom edges
of $\mathcal{L}$, denoted both by $F_3$, and $L_4 := \mathrm{span}(e_1 + e_2)$ for all $y$ in the one-dimensional face $F_4$
given by the edge generated by the intersection of $\mathcal{L}$ with the 45 degree line. The subspaces
spanned by vertices, which are zero-dimensional faces of $\mathcal{L}$, are null spaces, so we do not
need to consider them. We compute the projections of the Jacobian map onto these
subspaces:

\[
\begin{aligned}
\mathrm{proj}_{L_1} \circ \partial\Pi(y)[l] &= \partial\Pi(y) l, \\
\mathrm{proj}_{L_2} \circ \partial\Pi(y)[l] &= f_{Y,D}(y_0, 0 \mid Z = 0) l, \\
\mathrm{proj}_{L_3} \circ \partial\Pi(y)[l] &= f_{Y,D}(y_1, 1 \mid Z = 1) l, \\
\mathrm{proj}_{L_4} \circ \partial\Pi(y)[l] &= \{[f_{Y,D}(y_1, 1 \mid Z = 1) + f_{Y,D}(y_0, 0 \mid Z = 1) \\
&\qquad + f_{Y,D}(y_1, 1 \mid Z = 0) + f_{Y,D}(y_0, 0 \mid Z = 0)]/2\} l,
\end{aligned}
\]

for $y \in F_k$ and $l \in L_k$ in each of the cases.

We then compute the corresponding determinants of the maps

\[
\mathrm{proj}_{L_k} \circ \partial\Pi(y) : L_k \to L_k,
\]

where determinants are computed with respect to the coordinate systems of $L_k$, as
$\det[\partial\Pi(y)]$ for $k = 1$, $f_{Y,D}(y_0, 0 \mid Z = 0)$ for $k = 2$, $f_{Y,D}(y_1, 1 \mid Z = 1)$ for $k = 3$, and
$[f_{Y,D}(y_1, 1 \mid Z = 1) + f_{Y,D}(y_0, 0 \mid Z = 1) + f_{Y,D}(y_1, 1 \mid Z = 0) + f_{Y,D}(y_0, 0 \mid Z = 0)]/2$
for $k = 4$. Theorem 2 requires that these determinants are positive for values of $y \in F_k$.
This condition is implied by the simpler conditions (3.15) and (3.16). For the case of
$\mathcal{L} = q + C$, verification is analogous except that we do not need to consider $L_4$. Thus, the
positive determinant condition of Theorem 2 is implied by the conditions (3.15) and (3.16)
for $\mathcal{L} = q + C$ as well. □